DOCUMENT RESUME

ED 431 014                                                TM 029 846

AUTHOR       Li, Yuan H.; Ford, Valeria; Tompkins, Leroy J.
TITLE        The Construct Validity of a Performance-Based Assessment Program.
PUB DATE     1999-04-00
NOTE         44p.; Paper presented at the Annual Meeting of the American
             Educational Research Association (Montreal, Quebec, Canada,
             April 19-23, 1999).
PUB TYPE     Reports - Evaluative (142) -- Speeches/Meeting Papers (150)
EDRS PRICE   MF01/PC02 Plus Postage.
DESCRIPTORS  *Construct Validity; Correlation; Elementary Education;
             Elementary School Students; Grade 3; Grade 5; Multiple
             Choice Tests; *Performance Based Assessment; State Programs;
             *Structural Equation Models; Testing Programs
IDENTIFIERS  Comprehensive Tests of Basic Skills; *Maryland School
             Performance Assessment Program



ABSTRACT 



The purpose of this study was to examine the construct 
validity of a performance assessment program, the Maryland School Performance 
Assessment Program (MSPAP). Based on analyses of the longitudinal 
associations of Grade 5 MSPAP data in 1996 with Grade 3 MSPAP data in 1994, 
the following hypothesis was examined: the unattenuated correlation or the 
group-mean correlation between two similar measures of the same content area 
is higher than its correlations with different content areas. This hypothesis 
was not supported. In addition, the results analyzed by structural equation 
modeling (SEM) of this longitudinal correlation matrix reveal that the SEM 
model specified by the MSPAP six latent traits was unable to capture the 
underlying information of this data. Extra factors, such as a general ability 
and an assessment method effect, may need to be considered for better fitting 
data. SEM was performed on the multitrait-multimethod correlation data, and 
the traits of Reading and Mathematics were assessed by MSPAP and the 
Comprehensive Tests of Basic Skills (CTBS). The trait effects of MSPAP reading 
and CTBS mathematics application may be attenuated by the method effects of 
the performance-based assessment and the multiple-choice assessment, 
respectively. (Contains 6 figures, 7 tables, and 19 references.) (Author/SLD) 



******************************************************************************** 

* Reproductions supplied by EDRS are the best that can be made * 

* from the original document . * 

******************************************************************************** 






The Construct Validity of a Performance-based Assessment Program 



Yuan H. Li, Valeria Ford, Leroy J. Tompkins 



Prince George's County Public Schools, Maryland 







Paper presented at the annual meeting of the American Educational 
Research Association, April 19-23, 1999, Montreal, Canada 






The Construct Validity of a Performance-based Assessment Program* 

Abstract 

The purpose of this study is to examine the construct validity of a performance 
assessment program, the Maryland School Performance Assessment Program (MSPAP). 

Based on analyses of the longitudinal associations of Grade 5 MSPAP data in 
1996 with Grade 3 MSPAP data in 1994, the following hypothesis was examined: the 
unattenuated correlation or the group-mean correlation between two similar measures of 
the same content area is higher than its correlations with different content areas. This 
hypothesis was not supported. In addition, structural equation modeling (SEM) of this 
longitudinal correlation matrix revealed that the SEM model specified by the six MSPAP 
latent traits was unable to capture the underlying information in these data. Extra factors, 
such as a general ability and an assessment method effect, may need to be considered to fit 
the data better. 

SEM was also performed on the multitrait-multimethod correlation data, in which the 
traits of Reading and Math were assessed by MSPAP and CTBS (the Comprehensive Tests 
of Basic Skills). The trait effects of MSPAP reading and CTBS mathematics application 
may be attenuated by the method effects of the performance-based assessment and the 
multiple-choice assessment, respectively. 



Key Words: Construct Validity; Reliability; Performance-based Assessment; 
Structural Equation Modeling 



* The viewpoints expressed in this study reflect the authors’ opinions rather than those of the 
Prince George’s County Public Schools, Maryland. 






Construct Validity 



I. Introduction 

The Goals 2000: Educate America Act, passed in 1994, specifies that high learning 
standards and innovative forms of assessment should be used as the chief means to ensure that 
educational reform is on the right track. School systems are required to look beyond the 
traditional method of multiple-choice testing for better forms of assessment in evaluating 
students' achievement. Of the many innovative forms of assessment suggested, performance-based 
assessments have been widely adopted. They usually require students to construct a variety 
of responses to test items or tasks that are similar to classroom instructional activities and to 
those used in real life. 

Proponents of performance-based assessment hold that the real or perceived shortcomings of 
traditional assessments would be remedied by this transition. The expected improvements include 
more valid measures of student performance and the elimination or reduction of bias, or perceived 
bias, in traditional assessments. However, a number of psychometric issues remain unresolved 
(see the literature review by Green, 1995). For example, nonstandardized test formats or testing 
procedures, difficulty in maintaining the test specifications, inconsistent scoring rubrics, 
differences in raters' severity or leniency in scoring, and violations of the principles underlying 
the modeling of test data or the equating of tests may render the scores biased and not comparable 
from one year to the next. 

In a school improvement instructional model that includes a high-stakes testing program, 
data resulting from a performance assessment model must give building managers and 
teachers direction regarding the strengths and weaknesses of their instructional programs. 
The extent to which the data accurately provide this direction is an indicator of 
the validity of the assessments using such a model. Notwithstanding concerns pertaining to 
accountability from an administrative perspective, the extent to which the scores accurately 
reflect strengths and deficiencies in the instructional program is absolutely 
critical, because time, effort, and substantial resources must be allocated to address the 
deficient areas. In a data-driven instructional program, the reliability and validity of 
the data are paramount if schools are to attain the prescribed standards. 






The purpose of this study is to examine the construct validity of a performance 
assessment program by analyzing the performance-based test data set of one school district. We 
expect the results of this study to provide valuable assistance to other, similar 
performance-based assessment programs. 

Background of a Performance-based Assessment Program 

Since 1991, the Maryland State Department of Education (MSDE) has implemented the 
annual Maryland School Performance Assessment Program (MSPAP) for grades 3, 5, and 8 in all 
of its public schools. The MSPAP assessments cover six content areas: Reading (RDSS), 
Writing (WRSS), Language Usage (LSS), Mathematics (MSS), Social Studies (SSSS), and 
Science (SCSS). MSPAP is an innovative performance-based assessment. Because of the design of 
the MSPAP tests and their sampling design, the primary focus of the information provided by 
MSPAP assessments is school performance rather than individual student performance. 
Performance on the MSPAP has been used to evaluate whether schools meet a 
satisfactory standard set by the State Department of Education. Schools that consistently 
do not meet the standard may be managed by an outside organization if their MSPAP 
performance does not improve. Because MSPAP is used for accountability in this way, 
the score report is very important to test practitioners as well as to school authorities. 

MSPAP test items (tasks) are integrated both within a content area and across content 
areas so that students have an opportunity to integrate information they have learned (Maryland 
State Department of Education, 1996). To cover the required breadth of learning outcomes in 
limited testing time, three non-parallel test forms per content area were developed and randomly 
assigned to students within a school. ‘Non-parallel’ means that the test tasks of the three test 
forms are not all created from the same domains (groups of learning outcomes). An 
equating design was used for tracking schools’ yearly improvement. Three steps are taken to 
equate MSPAP scales between two years. For ease of understanding, the equating of the 
MSPAP 1995 and 1996 scale scores is described below as an example (for details, refer to 
Maryland State Department of Education, 1996). 

The first step, called “Adjusting Test Form Effect” (refer to Figure 1), equates the 
three test forms using the linear equipercentile equating procedure under the assumption that the 
abilities of the three groups taking the three test forms are very similar. The second step, called 
“Adjusting Rater Year Effect” (again refer to Figure 1), adjusts for systematic differences 
in rater leniency or strictness between the two year cohorts. In this equating, about 1,500 Answer 
Books per grade from the 1995 MSPAP administration were re-scored by some of the 1996 raters, who 
were trained to re-score students’ 1995 responses using the Scoring Guides developed for the 
1995 MSPAP. Rater effects were estimated separately by content area for each grade. 
The first set of scale scores (95SS95) was based on the ratings that the students had received 
from the 1995 raters. The second set of scale scores (95SS96) was based on the ratings that these 
students received from the 1996 raters. Both sets were expressed on the metric used for the 1995 
scale scores. Linear equipercentile equating procedures were used to estimate the transformation 
coefficients for the rater year effect, which were used to transform the metric of 95SS96 into 
that of 95SS95. 

The third step, called “Adjusting Yearly Test Version Effect” (see Figure 1), adjusted for 
systematic differences in test difficulty, which can differ between the two years. This step 
identified a group of students in each grade who took the 1996 MSPAP and were equivalent to the 
1996 group of students administered the 1995 MSPAP test. Linear equipercentile equating 
procedures were used to estimate the transformation coefficients for the yearly test version 
effect, which were used to align the metric of MSPAP 1996 with the metric of MSPAP 1995. 
Finally, the MSPAP 1996 scale scores were transformed to the metric of the MSPAP 1995 scale 
scores, using the transformation coefficients for the test form effect, the rater year effect, 
and the yearly test version effect. 
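
The chaining of transformation coefficients across the three steps can be sketched in a few lines of Python. This is only an illustration under a stated assumption: that each step is a linear equating which matches the means and standard deviations of two score sets from equivalent groups. The function names and the scores are hypothetical and are not taken from the MSPAP technical report.

```python
from statistics import mean, pstdev

def linear_equating_coefficients(scores_from, scores_to):
    """Return (slope, intercept) mapping the 'from' metric onto the 'to' metric.

    Assumes both score sets come from equivalent groups, so matching their
    means and standard deviations places them on a common scale.
    """
    slope = pstdev(scores_to) / pstdev(scores_from)
    intercept = mean(scores_to) - slope * mean(scores_from)
    return slope, intercept

def compose(first, second):
    """Combine two linear transformations: apply `first`, then `second`."""
    a1, b1 = first
    a2, b2 = second
    return a2 * a1, a2 * b1 + b2

def transform(coeffs, score):
    """Apply a (slope, intercept) pair to a single score."""
    slope, intercept = coeffs
    return slope * score + intercept

# Hypothetical scores for one adjustment step (illustrative values only).
step = linear_equating_coefficients([480, 500, 520], [490, 510, 530])

# Chaining two hypothetical steps gives one overall transformation.
overall = compose((2.0, 1.0), (3.0, 4.0))
```

Because a composition of linear transformations is itself linear, the three adjustments can in principle be collapsed into a single slope and intercept per content area and grade.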

Many statistical assumptions made for scaling (e.g., unidimensionality) and for test 
equating are unlikely to hold exactly in practice, especially for a performance-based program. 
In addition, this type of assessment may encounter the other practical problems pointed out 
previously, even though efforts were made to avoid them. 

[Insert Figure 1 here] 






II. Overview of Statistical Procedures 

A. Association between the Performance-based and Multiple-choice Assessments 

The degree of association between performance-based and multiple-choice assessments 
is often used to evaluate a new performance-based assessment program. However, 
test practitioners face a dilemma in interpreting the results of this type of 
statistical analysis. Should we expect a high correlation between the two measures? A high 
correlation might be a good indicator of validity. For instance, Yen (1998) investigated how 
CTBS/5 (Comprehensive Tests of Basic Skills) scores from the previous grade related to MSPAP 
proficiency. Of the Grade 2 students who took CTBS/5 reading and mathematics and were 
rated "Proficient", 65 and 64 percent were rated proficient one year later 
on MSPAP reading and mathematics, respectively. However, when the performance-based 
assessment is strongly associated with what the multiple-choice assessment intends to 
measure, one may question the need for the time-consuming performance-based method of 
assessing student achievement. 

On the other hand, do we expect to obtain a result with low correlation between two 
measures? Low correlation might be an indicator of the unique characteristics of the 
performance-based assessment as compared to the multiple-choice assessment. However, low 
correlation would be cause for concern, because the validity of the performance-based model 
would be called into question. 

A more sophisticated approach to investigating the association between two measures is 
the multitrait-multimethod (MTMM) approach developed by Campbell and Fiske (1959). 
This approach distinguishes four types of correlation, which are expected to be ordered from 
largest to smallest as follows (Nunnally & Bernstein, 1994): 

(1) The correlation (reliability) between the same trait scores measured by similar methods. 

(2) The correlation (validity) between the same trait scores measured by different methods. 

(3) The correlation between two different trait scores measured by similar methods. 

(4) The correlation between two different trait scores measured by different methods. 
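
The expected ordering of the four types of correlation can be checked mechanically. The Python sketch below is a minimal illustration, assuming each measure is described by a (trait, method) pair and that "similar methods" means the same method label. The function names and all correlation values are hypothetical and are not results from this or any cited study.

```python
from statistics import mean

def mtmm_category(a, b):
    """Return the category (1-4) of the correlation between two measures.

    Each measure is a (trait, method) tuple.
    """
    same_trait = a[0] == b[0]
    same_method = a[1] == b[1]
    if same_trait and same_method:
        return 1  # same trait, similar methods (reliability)
    if same_trait:
        return 2  # same trait, different methods (validity)
    if same_method:
        return 3  # different traits, similar methods
    return 4      # different traits, different methods

def category_means(correlations):
    """Average the correlations that fall into each category."""
    buckets = {1: [], 2: [], 3: [], 4: []}
    for (a, b), r in correlations.items():
        buckets[mtmm_category(a, b)].append(r)
    return {k: mean(v) for k, v in buckets.items() if v}

def follows_expected_order(means):
    """True if the category means strictly decrease from category 1 to 4."""
    ordered = [means[k] for k in sorted(means)]
    return all(x > y for x, y in zip(ordered, ordered[1:]))

# Hypothetical correlations; MC = multiple choice, PB = performance based.
corr = {
    (("reading", "MC"), ("reading", "MC")): 0.85,
    (("reading", "MC"), ("reading", "PB")): 0.60,
    (("reading", "MC"), ("math", "MC")): 0.55,
    (("reading", "MC"), ("math", "PB")): 0.40,
}
pattern_holds = follows_expected_order(category_means(corr))
```

A check of this kind only summarizes the observed pattern; it does not separate trait effects from method effects.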







Schatz (1998) applied the MTMM approach to examine the reliability and validity 
coefficients for reading and mathematics achievement scores. Each content area was assessed by 
two multiple-choice measures, CTBS/4 and a CRT (Criterion Referenced Test), and by one 
performance-based assessment, MSPAP. The expected order of correlation coefficients was 
found for the content area of Mathematics at three grade levels: Grades 3, 5, and 8. The validity 
coefficients for the content area of Reading did not fit the expected pattern at any of the three 
grade levels. Was this problem caused by the performance-based assessment or by the 
multiple-choice assessment? The analysis of MTMM correlations could not answer this question 
clearly. In addition, visual inspection of a correlation matrix to assess construct validity 
can be problematic because of measurement and sampling errors. 

When the degree of association between two different types of assessment is used to 
evaluate the construct validity of a performance-based assessment, researchers generally 
encounter problems in reaching a conclusion. Structural equation modeling (SEM; for a 
literature review, see Schmitt & Stults, 1986) may relieve part of this problem. It is capable of 
further partitioning the variance of each content measure into three components: specific trait, 
assessment method, and random error. Comparing the magnitudes of the three 
components for each measure provides another criterion for evaluating the construct validity of 
the performance-based or multiple-choice assessment. More technical details are given in 
the Methodology section. 

B. Longitudinal Association between two Performance-based Measures 

An alternative way to evaluate a performance-based assessment program is to examine the 
longitudinal association between two performance-based measures, for instance, the test scores of 
students who have multiple-subject scores on two performance-based assessments, one taken 
in the current grade and one in an earlier grade. An intercorrelation analysis is performed. One 
might expect that the correlation between two performance-based measures of the same content 
area should be higher than its correlations with different content areas when measurement error 
is appropriately controlled. This type of analysis does not depend on different types of 
measures, so the correlations obtained are much easier to interpret than those 
from the association between two different types of measures. 









III. Methodology 

A. Longitudinal Associations of Grade 3 MSPAP with Grade 5 MSPAP 
1. Data Description and Sample Size 

Test scores for students who had six content area scores on both MSPAP measures, when 
they were in third grade in 1994 and in fifth grade in 1996, were collected from the Prince 
George’s County school district. Records for approximately 5,500 students were available. 



2. Data Analysis and Evaluation 

The intercorrelations among students’ performance on the measures from the two time 
periods in six content areas were analyzed. The sampling error is minor because of the 
relatively large sample size. However, the measurement errors (unreliability) of the two measures, 
particularly in a performance-based measure, cannot be avoided and will attenuate the 
correlations (Lord, 1980). 

A correction for attenuation can be obtained by computing the true-score (without 
measurement error) relationship between two tests. Technically, when structural equation 
modeling is applied, creating factors with only a single measured indicator variable is a tool 
for approximating the true-score correlation. Consider the diagram in Figure 2. TRD96 and TWR96 
represent the constructs underlying the observed variables of MSPAP reading in 1996 (RD96) and 
MSPAP writing in 1996 (WR96), respectively. The error variances of the 
standardized-scale variables RD96 and WR96 can be approximated by 1 - Reliability Coefficient. 
These error variances were fixed while estimating the correlation between TRD96 and 
TRD94. The internal-consistency reliability coefficient, Cronbach's alpha, was available from the 
MSPAP technical report and was used to approximate the error variance. Similar principles apply 
in computing the true-score correlation of any pair of tests. In essence, the true-score correlation 
of two measures depends on trustworthy reliability information. In the rest of the figures 
presented in this study, rectangles and circles denote observed variables and latent factors, 
respectively. The labels RD, WR, LS, MS, SS, and SC stand for MSPAP reading, writing, 
language usage, mathematics, social studies, and science, respectively. The numbers 96 and 94 
denote the year. The symbols “E” and “D” represent the error term for an observed variable and 
the residual term for a latent variable, respectively. The SEM computer program EQS (Bentler, 
1995) was used to estimate the SEM parameters of interest. 
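
For a single pair of standardized measures, fixing each error variance at 1 minus the reliability should correspond to the classical correction for attenuation. The Python sketch below illustrates that formula under this assumption. It is not the EQS setup used in the study, and the function names and input values are hypothetical.

```python
import math

def error_variance(reliability):
    """Error variance of a standardized score: 1 - reliability."""
    return 1.0 - reliability

def disattenuated_correlation(r_xy, rel_x, rel_y):
    """Classical correction for attenuation: r_xy / sqrt(rel_x * rel_y)."""
    if not (0 < rel_x <= 1 and 0 < rel_y <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / math.sqrt(rel_x * rel_y)

# Illustrative values only (not taken from the MSPAP technical report).
r_true = disattenuated_correlation(r_xy=0.42, rel_x=0.49, rel_y=0.64)
```

With these hypothetical inputs the observed correlation of 0.42 rises to 0.75, which shows how strongly the corrected value depends on the reliability estimates supplied.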

An alternative way to minimize the effect of measurement error on the estimated 
intercorrelations between two measures is to use school-based scores (school means) instead 
of individual students' scores, which are unreliable measures given the way the MSPAP tests are 
constructed, as described in the MSPAP technical report. This school-based correlation analysis 
is particularly meaningful for MSPAP. 
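
A school-based correlation of this kind can be sketched as follows: average each measure within a school, then correlate the school means. The record layout (school identifier, first score, second score), the function names, and the scores are assumptions made for illustration only.

```python
import math
from collections import defaultdict
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def school_mean_correlation(records):
    """Correlate school means; records are (school_id, score_a, score_b)."""
    by_school = defaultdict(list)
    for school, a, b in records:
        by_school[school].append((a, b))
    means_a = [mean(a for a, _ in rows) for rows in by_school.values()]
    means_b = [mean(b for _, b in rows) for rows in by_school.values()]
    return pearson(means_a, means_b)

# Hypothetical student records for three schools (illustrative values only).
records = [
    ("A", 500, 510), ("A", 520, 530),
    ("B", 480, 470), ("B", 500, 490),
    ("C", 530, 540), ("C", 550, 560),
]
r_group = school_mean_correlation(records)
```

Averaging within schools reduces the influence of student-level measurement error, at the cost of a much smaller number of observations.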

Based on the above longitudinal association analyses, the following hypothesis was 
examined: the adjusted or group-mean correlation between two similar measures of the same 
content area should be higher than its correlations with different content areas (Nunnally & 
Bernstein, 1994). 

[Insert Figure 2 here] 

An exploratory factor analysis was conducted, for instance, on the Grade 5 MSPAP data from 
1996. In addition, structural equation modeling was conducted to partition the 
variance of each MSPAP content area measure into three components: specific trait, 
measurement method, and error. Several specific SEM models are described below. Model 
comparisons were performed to explore which model fit the data better. It is 
important to note that our model comparisons were by no means exhaustive. Other models may 
be of interest. 

Model L1: Six Correlated Latent Traits 

A model for the unadjusted intercorrelations in Table 1 is represented by the path diagram 
shown in Figure 3. Six latent variables representing the true scores on the six traits are 
postulated. For instance, the latent trait READING is assumed to be measured by RD96 and 
RD94. In addition, these six latent traits are intercorrelated. Each observed measurement is 
assumed to be determined by a trait and an error term. The variance of the error term of each 
standardized-scale variable is constrained to 1 - Reliability Coefficient, where the reliability 
coefficient is Cronbach's alpha. The assumption behind this model is that the 
six intercorrelated latent traits and their corresponding measurement errors are capable of 
explaining the intercorrelation matrix being analyzed. 

[Insert Figure 3 here] 







Model L2: A Second-order Trait Model 

Another model for the intercorrelations described previously is represented by the path 
diagram shown in Figure 4. Since the correlations among the six lower-order 
factors (latent traits) specified in Model L1 were relatively high, a higher-order factor (labeled 
Second-Order F), rather than the correlations among these six traits, was 
hypothesized to account for this correlation matrix. The variance of each error term was 
constrained by the method described previously. This model resembles Model 
L1 in that only traits and error terms are specified. In contrast, the models 
described below include method effects. We tried to incorporate 
method effects into Model L1. Unfortunately, the problem of linear dependence among some 
parameter estimates (refer to Bentler, 1995) was encountered. Accordingly, Model L2 
serves as the baseline against which an alternative model, Model L3 presented below, is 
compared. 

[Insert Figure 4 here] 

Model L3: A Second-order Trait and Method Effects 

Model L3 (see Figure 5) was formed by adding method effects to Model L2, so 
Model L2 is nested within Model L3. It is hypothesized that the six content measures from the 
1996 data reflect a 1996 method effect (PAM96) and the six content measures from the 1994 data 
reflect a 1994 method effect (PAM94). This model was compared with Model L2 
to explore whether the method effects significantly improve the fit to the data. 

[Insert Figure 5 here] 

Model L4: Modified Model L3 by Freeing Several Error Variances 

To improve the model-data fit, several error-term variances and a residual covariance 
for the second-order factor analysis were freed to be estimated. They are specified in 
Table 4. 







B. Multitrait-multimethod Associations of MSPAP, CTBS and OLSAT 

1. Data Description and Sample Size 

Students in third grade in 1996 had six MSPAP content area scores, three CTBS content area 
scores (Reading Vocabulary Scale, RVS; Reading Comprehension Scale, RCS; and Math 
Application Scale, MAS), and a score on the Otis-Lennon School Ability Test (OLSAT). The 
CTBS and OLSAT are multiple-choice instruments. The sample size is about 7,000. 

2. Data Analysis and Evaluation 

An intercorrelation analysis was conducted for the six MSPAP content area scores, 
the three CTBS content area scores, and the OLSAT score. Similar correlation analyses were 
conducted using school-based mean statistics. Structural equation modeling was applied to the 
intercorrelation matrix in Table 5 (MSPAPRD, MSPAPMS, CTBSRVS, CTBSRCS, and CTBSMAS). 
Four specific SEM models are described below. Hypothesis tests and fit 
indices are used to evaluate whether the models are tenable. In addition, a test of the difference 
in chi-square values between two nested models is used to evaluate which model better 
captures the data. Finally, decomposing the variance of the reading or mathematics measures 
into three components (specific trait, measurement method effect, and error) can be used to 
evaluate whether the assessment method effects attenuate the trait effect. 
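
The variance decomposition can be illustrated with a short Python sketch. It assumes a standardized measure whose trait and method factors are uncorrelated with each other, so that the squared standardized loadings and the error proportion sum to 1. The function name and the loadings are hypothetical and are not estimates from this study.

```python
def decompose_variance(trait_loading, method_loading):
    """Split a standardized measure's variance into trait, method, and error.

    Assumes the trait and method factors are mutually uncorrelated, so the
    three proportions sum to 1.
    """
    trait = trait_loading ** 2
    method = method_loading ** 2
    error = 1.0 - trait - method
    if error < 0:
        raise ValueError("loadings imply more than 100% explained variance")
    return {"trait": trait, "method": method, "error": error}

# Hypothetical standardized loadings (illustrative values only).
parts = decompose_variance(trait_loading=0.6, method_loading=0.7)

# A method proportion larger than the trait proportion would suggest that
# the method effect may be attenuating the trait effect for this measure.
method_dominates = parts["method"] > parts["trait"]
```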

Model M1: Correlated Latent Traits and Correlated Method Effects 

A baseline model for the intercorrelation matrix is represented by the path diagram 
shown in Figure 6. Two correlated trait factors and two correlated method-effect factors are 
hypothesized to underlie the correlation matrix. Specifically, it is hypothesized that the latent 
trait READING is measured by MSPAPRD (MSPAP reading), CTBSRVS (CTBS reading 
vocabulary), and CTBSRCS (CTBS reading comprehension). MSPAPMS (MSPAP 
mathematics) and CTBSMAS (CTBS mathematics application) are hypothesized to be indicators 
of another latent variable, MATH. It is hypothesized that MSPAPRD and MSPAPMS reflect 
the method of performance-based assessment (called MSPAP) and that CTBSRVS, CTBSRCS, and 
CTBSMAS reflect the method of multiple-choice assessment (called CTBS). This model serves as 
the baseline against which the alternative models presented below are compared. It is typically 
the least restrictive model. The variances of the error terms for MSPAPRD, MSPAPMS, and 
CTBSMAS were constrained by the method described previously. The error term variance for 
CTBSMAS was unavailable and was approximated (set to .20) in order to gain 1 degree of 
freedom. The variances of the error terms for CTBSRVS and CTBSRCS were free to be 
estimated since no reliability information was available for these two measures. 

[Insert Figure 6 here] 

Model M2: No Traits and Correlated Method Effects 

Model M2 is nested within Model M1. No trait factors were specified in the model. 

Model M3: Perfectly Correlated Traits and Correlated Method Effects 

Model M3 was formed by fixing the correlation between the two trait factors to 1.0 in 
Model M1. 

Model M4: Correlated Traits and Perfectly Correlated Method Effects 

Model M4 was formed by fixing the correlation between the two method factors to 1.0 in 
Model M1. 

Using Widaman’s (1985) paradigm, evidence of convergent validity can be tested by 
comparing a model in which traits are specified (Model M1) with one in which they are not 
(Model M2). A test of the difference in chi-square values between the two models was conducted. 
A more specific assessment of convergent validity can be made by examining the variance 
components of each measure due to trait, method, and error. Further scrutiny of the variance 
components might detect whether method effects are likely to attenuate the trait effects. 

To test for evidence of discriminant validity among traits, a model in which the traits 
correlate freely (Model M1) was compared with one in which they are perfectly 
correlated (Model M3). A test of the difference in chi-square values between the two 
models was conducted to evaluate the discriminant validity of the traits. 
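
A chi-square difference test between two nested models can be sketched in Python as follows. The upper-tail probability is computed with a standard series expansion of the incomplete gamma function so that the example needs no third-party library. The function names and the fit statistics are hypothetical and are not results from this study.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with df degrees of freedom.

    Uses the series expansion of the regularized lower incomplete gamma
    function; adequate for moderate x, not tuned for extreme values.
    """
    if x <= 0:
        return 1.0
    a, t = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > total * 1e-16:
        term *= t / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-t + a * math.log(t) - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def chi2_difference_test(chi2_restricted, df_restricted, chi2_full, df_full):
    """Return (delta_chi2, delta_df, p_value) for two nested models."""
    delta_chi2 = chi2_restricted - chi2_full
    delta_df = df_restricted - df_full
    if delta_df <= 0 or delta_chi2 < 0:
        raise ValueError("restricted model needs more df and a larger chi-square")
    return delta_chi2, delta_df, chi2_sf(delta_chi2, delta_df)

# Hypothetical fit statistics for a restricted and a less restricted model.
delta, ddf, p_value = chi2_difference_test(120.0, 6, 100.0, 4)
```

A small p-value indicates that the restriction (for example, fixing a factor correlation to 1.0) significantly worsens the fit, which is read as evidence for the less restricted model.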

The same logic was used to evaluate the evidence of discriminant 
validity among methods. A model in which the method factors were freely correlated (Model M1) 
was compared with one in which they are perfectly correlated (Model M4). A test of the 
difference in chi-square values between the two models was conducted to evaluate the evidence of 
discriminant validity of the method factors. Finally, we remind readers that our model 
comparisons were by no means exhaustive. Other alternative models may be of interest. 

IV. Results and Discussions 

A. Analyses of Longitudinal Associations of Grade 3 MSPAP with Grade 5 MSPAP 

1. Three Types of Intercorrelation Matrix 

The results of the correlation analysis for students who had six content area scores on both 
MSPAP measures, when they were in third grade in 1994 and in fifth grade in 1996, are presented 
in Table 1. As described in the Methodology section, three types of correlation analysis were 
performed. The unadjusted correlation (labeled UnAdj) is presented in the first row of each 
cell. The correlation corrected for attenuation (labeled Adj) is presented in the second row of 
each cell. The correlation calculated from school means (labeled Group) is presented in the 
third row of each cell. 

[Insert Table 1 here] 

The underlined values represent the reliability coefficients, which reflect the underlying-trait 
true correlations between the two measures of the same content area across two years. Similarly, 
the values shown in bold font represent the reliability coefficients based on school means. The 
off-diagonal values within the thick black borders represent the correlations of two measures of 
different content areas between MSPAP 1994 and 1996. One might expect that the correlation 
between two measures of the same content area should be higher than its correlations with 
different content areas. Unfortunately, this was not the case for all the content areas. For 
instance, the adjusted correlation between Read96 and Read94 was 0.620, which was smaller than the 
correlations of Read96 with Social Study94 (.650), Social Study96 (.657), and Science96 (.667). 
The hypothesis made in this study does not hold well in the test data examined. These results 
raise two questions. The first question is: can this result be generalized to other school 
districts or to the whole state? This question can be appropriately examined by analyzing 
longitudinal data collected from the school districts of the whole state. The second question 
is: can this assumption be retained when a multiple-choice assessment program (for instance, the 
CTBS multiple-subject assessments) is applied? A future study of the longitudinal associations 
of multiple-choice assessments should be conducted to serve as a base for comparison with 
this study. Practically, if the answer to the latter question is “NO”, one might wonder whether 
the hypothesis made in this study is impractical for a multiple-subject assessment program. 
In that case, the fact that the MSPAP test data analyzed in this study violated this 
assumption becomes less serious than we originally thought. However, if the answer is “YES”, 
the search for the reasons, for instance, scaling or test equating issues in MSPAP, will become 
critical. 

2. Exploratory Factor Analysis 

A further factor analysis was conducted on the correlation matrix of the six content area 
scores, for instance, of the Grade 5 MSPAP data in 1996. Approximately 72 
percent of the variance-covariance of these six content area scores was accounted for by one 
latent trait. One possible reason for this finding is that, because the MSPAP test tasks are 
integrated both within a content area and across content areas, this integration may capture 
most of the common variance among the six content scale scores. 

Another possible reason is that this common variance may reflect a general ability 
(Cronbach, 1970). Accordingly, an estimate of the proportion of unique variance for each 
content-area measure is a valuable index of the efficacy of a specific content-area 
measure. The proportion of unique variance for each content area can be estimated by subtracting 
the proportion of error variance from the corresponding proportion of unexplained variance 
(unique plus error variance), which equals one minus the communality. The 
proportion of error variance for each content test can be approximated by one minus the 
corresponding coefficient alpha (from the MSPAP 1996 technical report). The resulting proportions 
of unique variance for Reading (0.05), Writing (0.09), Social Studies (0.06), and Science (0.09) 
are very low (see Table 2). 

[Insert Table 2 here] 
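
The unique-variance estimate described above reduces to a two-step subtraction. The sketch below follows that description; the function name and the input values are hypothetical (the communalities and alphas of Table 2 are not legible in this copy), so the numbers serve only to show the arithmetic.

```python
def unique_variance(communality, alpha):
    """Proportion of reliable variance specific to one content-area measure."""
    unexplained = 1.0 - communality  # unique plus error variance
    error = 1.0 - alpha              # error variance approximated from coefficient alpha
    return unexplained - error       # algebraically equal to alpha - communality


# Hypothetical inputs: communality .80, coefficient alpha .85.
example = unique_variance(communality=0.80, alpha=0.85)
print(round(example, 2))
```

Because the result equals alpha minus the communality, a measure can only show substantial unique variance when its reliability clearly exceeds the share of its variance explained by the common factor.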

The finding from the exploratory factor analysis is not consistent with results from the
factor analysis literature, in which verbally oriented ability tests such as Reading,
Writing and Language Usage and mathematically oriented ability tests such as Math and Science usually
load on two different underlying traits. Further analyses using structural equation
modeling (Bollen, 1989) were conducted and are presented below.








3. Hypothesis Tests for Models L1 to L4 and Model Comparisons

The hypothesis tests for Models L1 to L4 are presented in Table 3. Models L1 and L2
fit the data poorly in terms of fit indices. Neither hypothesis is tenable: not the one made in
Model L1 (the six correlated traits themselves are capable of accounting for the correlation
matrix being analyzed) nor the one made in Model L2 (a higher-order trait can capture the
intercorrelations among the six traits). However, when the method effects were added to Model L2,
a significant increase in data-model fit was found in Model L3 (see Table 3). Hu and Bentler (1999)
recommended joint criteria to retain a model, such as (CFI >= .96 and SRMR <= .10) or
(RMSEA <= .06 and SRMR <= .10). Model L4, which frees some error term variances in Model L3,
is retained because it meets either of these two joint criteria.

[Insert Table 3 here] 
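
The joint retention rule quoted above is easy to state as a predicate. The sketch below uses the cutoffs exactly as they appear in the text; the function name and the fit values passed to it are hypothetical, since the entries of Table 3 are not legible in this copy.

```python
def meets_joint_criteria(cfi, rmsea, srmr):
    """Apply the two joint cutoff rules as quoted in the text."""
    rule_cfi = cfi >= 0.96 and srmr <= 0.10
    rule_rmsea = rmsea <= 0.06 and srmr <= 0.10
    return rule_cfi or rule_rmsea


# Hypothetical fit indices, for illustration only.
good_fit = meets_joint_criteria(cfi=0.97, rmsea=0.08, srmr=0.05)  # first rule holds
poor_fit = meets_joint_criteria(cfi=0.90, rmsea=0.09, srmr=0.05)  # neither rule holds
print(good_fit, poor_fit)
```

Both rules share the SRMR condition, so a model with a large SRMR is rejected regardless of its CFI or RMSEA.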

According to Model L4, the variance components due to Trait, Method Effect, and Error
for the first-order factor analysis are presented in Table 4. The method effects play a substantive
role in accounting for the variance of the six latent traits. The corresponding variance components
due to the second-order factor and residuals for the second-order factor analysis are also presented
in Table 4. The latent traits of Reading, Math, Social Studies and Science had very high loadings on the
higher-order factor.

[Insert Table 4 here] 
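
For a standardized measure that loads on one trait factor and one method factor, with the two factors assumed uncorrelated, the variance components are the squared loadings and the error share is the remainder. The sketch below assumes that setup; the loadings are hypothetical and are not taken from Table 4.

```python
def variance_components(trait_loading, method_loading):
    """Split the variance of a standardized measure into trait, method and error."""
    trait = trait_loading ** 2
    method = method_loading ** 2
    error = 1.0 - trait - method
    if error < 0:
        raise ValueError("squared loadings exceed the total variance of 1")
    return {"trait": trait, "method": method, "error": error}


# Hypothetical standardized loadings.
components = variance_components(trait_loading=0.7, method_loading=0.5)
print({name: round(value, 2) for name, value in components.items()})
```

With these illustrative loadings the method factor carries about a quarter of the variance, which is the kind of share the text describes as substantive.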

B. Multitrait-multimethod Associations of MSPAP, CTBS and OLSAT

1. Two Types of Intercorrelations 

The intercorrelation analyses among MSPAP, CTBS and OLSAT measures for the 1996 test
data were carried out and their results are presented in Table 5. As noted in the methodology
section, two types of correlation analysis were performed. The unadjusted correlation
(labeled UnAdj) is presented in the first row of each cell. The correlation calculated from
school means (labeled Group) is presented in the second row of each cell. The reading
measure of MSPAP is almost equally correlated with the reading vocabulary, reading
comprehension and math application measures of the CTBS test and the general ability measure of
OLSAT. A similar pattern was found for the mathematics measure of MSPAP.

[Insert Table 5 here] 
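
The two types of correlation differ only in the unit of analysis. The sketch below contrasts a student-level (unadjusted) Pearson correlation with the correlation of school means; the data, the school labels and the function names are hypothetical and serve only to show the computation.

```python
from collections import defaultdict
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def school_mean_correlation(schools, x, y):
    """Correlate the per-school means of x and y."""
    groups = defaultdict(list)
    for school, a, b in zip(schools, x, y):
        groups[school].append((a, b))
    mean_x = [sum(a for a, _ in rows) / len(rows) for rows in groups.values()]
    mean_y = [sum(b for _, b in rows) / len(rows) for rows in groups.values()]
    return pearson(mean_x, mean_y)


# Hypothetical scores for six students in three schools.
schools = ["A", "A", "B", "B", "C", "C"]
mspap = [500, 520, 530, 510, 560, 540]
ctbs = [48, 52, 50, 54, 60, 58]

student_r = pearson(mspap, ctbs)
group_r = school_mean_correlation(schools, mspap, ctbs)
print(round(student_r, 3), round(group_r, 3))
```

Averaging within schools removes student-level noise, so the group correlation is typically higher than the unadjusted one, as it is in this toy data set.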







2. Hypothesis Tests for Models M1 to M4 and Model Comparisons

The chi-square value for the hypothesized Model M1 (Correlated Traits and Correlated
Methods) is .21 (see Table 6). The corresponding probability level is .645, indicating that this model
fits the data. In addition, this model is retained according to the two joint criteria (Hu & Bentler,
1999).
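
The probability attached to a model chi-square is the upper-tail area of the chi-square distribution. The sketch below computes it with the standard library only. The degrees of freedom for Model M1 are not legible in this copy, so `df=1` is an assumption; under that assumption a chi-square of .21 gives a probability of about .65, close to the reported .645.

```python
from math import erfc, exp, gamma, sqrt


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x < 0 or df < 1:
        raise ValueError("x must be >= 0 and df must be a positive integer")
    y = x / 2.0
    if df % 2 == 1:
        a, q = 0.5, erfc(sqrt(y))  # regularized upper gamma Q(1/2, y)
    else:
        a, q = 1.0, exp(-y)        # regularized upper gamma Q(1, y)
    # Recurrence: Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / gamma(a + 1)
    while a < df / 2.0:
        q += y ** a * exp(-y) / gamma(a + 1.0)
        a += 1.0
    return q


# Assumed df = 1 for Model M1 (not stated in the surviving text).
p_m1 = chi2_sf(0.21, df=1)
print(round(p_m1, 3))
```

A large probability means the model-implied matrix does not differ detectably from the observed one, which is why a non-significant chi-square is read as good fit.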

The chi-square value and the goodness-of-fit statistics for the Model M2 (No Traits and 
Correlated Methods) are presented in Table 6. As indicated by the chi-square and the fit indices, 
the goodness of fit for Model M2 was poor. 

[Insert Table 6 here] 

The evidence of convergent validity was tested by comparing a model in which traits are
specified (Model M1) with one in which they are not (Model M2), using Widaman’s (1985)
paradigm. A significant difference in chi-square values between the two models supports
convergent validity, as happened here (see Table 6). A more specific assessment of
convergent validity can be made by examining the variance components of each
measure due to trait, method and error (see Table 7). Further scrutiny of the variance components
suggests that method effects may attenuate the trait effects. For instance, the method
effect might play a substantive role in accounting for the variance of MSPAP reading. This result
might help us interpret the finding from Schatz’s (1998) study, described in the literature review
section, that the validity coefficients for the content area of Reading did not follow the expected
order of correlation coefficients. The trait effect of the CTBS mathematics application was also
attenuated by the multiple-choice assessment method. The results from the variance component
analysis seem to imply that either the performance-based assessment or the multiple-choice
assessment can attenuate the trait effects.

[Insert Table 7 here] 

The chi-square value and the goodness-of-fit statistics for Model M3 (Perfectly
Correlated Traits and Correlated Methods) are presented in Table 6. The fit of this
model is fairly good, albeit slightly worse than that of Model M1. In testing for evidence of
discriminant validity among traits, a significant difference in chi-square values between Model M1
and Model M3 was found (see Table 6), which supports the discriminant validity of the traits.

The chi-square value and the goodness-of-fit statistics for Model M4 (Correlated
Traits and Perfectly Correlated Methods) are presented in Table 6. The fit of this model is almost
as good as that of Model M3, albeit slightly worse than that of Model M1. In testing for evidence
of discriminant validity among methods, we applied the same logic as noted earlier. A significant
difference in chi-square values between Models M1 and M4 was found (see Table 6),
which supports the discriminant validity of the method factors.
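
The model comparisons in this section all follow the same nested-model logic: the restricted model's chi-square minus the full model's chi-square is itself a chi-square with degrees of freedom equal to the difference in degrees of freedom. The sketch below applies that logic with a small table of .05 critical values; the function name and the chi-square and df values passed to it are hypothetical, since Table 6 is not legible in this copy.

```python
# Critical values of the chi-square distribution at alpha = .05.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def chi_square_difference(chi2_restricted, df_restricted, chi2_full, df_full):
    """Return (delta chi-square, delta df, significant at .05) for nested models."""
    delta_chi2 = chi2_restricted - chi2_full
    delta_df = df_restricted - df_full
    if delta_df not in CRITICAL_05:
        raise ValueError("delta df is outside the small lookup table")
    return delta_chi2, delta_df, delta_chi2 > CRITICAL_05[delta_df]


# Hypothetical values: a restricted model (e.g. perfectly correlated traits)
# compared with a less restricted model (e.g. freely correlated traits).
delta, ddf, significant = chi_square_difference(25.4, 4, 1.2, 1)
print(round(delta, 1), ddf, significant)
```

A significant difference means the restriction worsens fit, which in this framework is read as evidence that the traits (or methods) are distinguishable.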






V. Summary and Conclusion 

The primary concern of this study was to examine the construct validity of MSPAP by
analyzing the performance-based test data set of one school district. Based on the
analyses of the longitudinal associations of Grade 5 MSPAP data in 1996 with Grade 3 MSPAP
data in 1994, the following hypothesis was examined: the unattenuated correlation or the
group-mean correlation between two similar measures of the same content area is higher than its
correlations with different content areas. This hypothesis was not supported. Although this finding
might threaten the construct validity of MSPAP and raise the broad question of whether the
content-area scores obtained on MSPAP reflect the efficacy of the instructional programs
delivered in schools, school districts, and the state, we should be careful not to prejudge this
issue because of two questions associated with this finding. The questions are: (1) Can this result
be generalized to the test data of other school districts or the whole state? and (2) Can this
assumption be retained when the multiple-choice assessment program (for instance, CTBS
multiple-subject assessments) is applied? These two questions need clarification in future
work.

In addition, the results of structural equation modeling (SEM) applied to this longitudinal
correlation matrix reveal that the SEM model specified by the six MSPAP latent traits was unable to
capture the underlying information in these data. Extra factors, such as a general ability and an
assessment method effect, may need to be considered to fit the data better. This result seems to
imply that what we have observed in MSPAP data are more general measures of student ability than
measures of performance in any given content area.

The results from structural equation modeling of the multitrait-multimethod correlation
data suggest that the trait effect of MSPAP reading may be attenuated by the method effect of the
performance-based assessment. Similarly, the trait effect of CTBS mathematics application may
be attenuated by the method effect of the multiple-choice assessment. Attenuation of
trait effects by assessment methods can thus happen in either performance-based or
multiple-choice assessment. Whether these findings can be generalized to other
MSPAP data is worth investigating in the future.








The primary rationale for moving away from multiple-choice assessment to
performance-based assessment comes from a strong belief that student “assessment needs to
mirror instruction and high-quality learning activities” (Linn, 1995, p. 53). This movement is
motivated primarily by instructional rather than psychometric considerations. The current
psychometric techniques, such as test equating and scaling, that have predominantly been used for
multiple-choice assessment for a long time may not be completely suitable for this new
assessment movement. New psychometric techniques such as multidimensional scaling
(Ackerman, 1994; Reckase, 1997) and equating (Li & Lissitz, in press) will serve as important
tools to quantify this type of assessment once these new statistical techniques
resolve the problems they are facing.






References 

Ackerman, T. A. (1994). Using multidimensional item response theory to understand what items
and tests are measuring. Applied Measurement in Education, 4, 255-278.

Bentler, P. M. (1995). EQS structural equations program manual. Encino, CA: Multivariate
Software, Inc.

Bollen, K. A. (1989). Structural equations with latent variables. New York: A Wiley-Interscience
Publication.

Campbell, D. T. & Fiske, D. W. (1959). Convergent and discriminant validation by the
multitrait-multimethod matrix. Psychological Bulletin, 56, 81-105.

Cronbach, L. J. (1970). Essentials of psychological testing. New York: Harper & Row Publishers,
Inc.

Green, B. F. (1995). Comparability of scores from performance assessments. Educational
Measurement: Issues and Practice, Winter, 13-15.

Hu, L. & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis:
Conventional criteria versus new alternatives. Structural Equation Modeling: A Multidisciplinary
Journal, 6, 1-55.

Li, Y. H. & Lissitz, R. W. (in press). An evaluation of multidimensional IRT equating
methods by assessing the accuracy of transforming parameters onto a target test metric.
Applied Psychological Measurement.

Linn, R. L. (1995). High-stakes uses of performance-based assessments: Rationale, examples, and
problems of comparability. In T. Oakland & R. K. Hambleton (Eds.), International perspectives on
academic assessment (pp. 49-73). Norwell, MA: Kluwer Academic Publishers.

Lord, F. M. (1980). Applications of item response theory to practical testing problems. New Jersey:
Lawrence Erlbaum Associates, Inc.

Maryland State Department of Education. (1996). Technical report: 1996 Maryland School
Performance Assessment Program. Baltimore: Author.

Nunnally, J. C. & Bernstein, I. H. (1994). Psychometric theory. New York: McGraw-Hill, Inc.

Reckase, M. D. (1997). The past and future of multidimensional item response theory. Applied
Psychological Measurement, 21, 25-36.

Schatz, C. J. (1998, November). Convergent-discriminant validity evidence for the MSPAP Reading
and Math scores. Paper presented at the annual meeting of the Maryland Assessment Group,
Ocean City, MD.







Schmitt, N. & Stults, D. M. (1986). Methodology review: Analysis of multitrait-multimethod
matrices. Applied Psychological Measurement, 10, 1-22.

Widaman, K. F. (1985). Hierarchically nested covariance structure models for multitrait-multimethod
data. Applied Psychological Measurement, 9, 1-26.

Yen, W. M. & Ferrara, S. (1997). The Maryland school performance assessment program:
Performance assessment with psychometric quality suitable for high stake usage. Educational and
Psychological Measurement, 57, 60-84.

Yen, W. M. & Ferrara, S. (1997). The technical quality of performance assessments: Standard errors
of percents of pupils reaching standards. Educational Measurement: Issues and Practice, Fall, 5-15.

Yen, W. M. & Julian, M. W. (1998, November). How CTBS/5 scores from the previous grade relate
to MSPAP proficiency. Paper presented at the annual meeting of the Maryland Assessment
Group, Ocean City, MD.







Figure Headings 

Figure 1: The Equating Design Used for Equating MSPAP 1995 and 1996 Scale Scores

Figure 2: A SEM Model for Computing the Intercorrelations among the Twelve True
Scores of MSPAP

Figure 3: A SEM Model: Six Correlated Latent Traits for a MSPAP Longitudinal
Associations Data

Figure 4: A SEM Model: A Second-order Factor for a MSPAP Longitudinal
Associations Data

Figure 5: A SEM Model: A Second-order Factor and Method Effects for a MSPAP
Longitudinal Associations Data

Figure 6: A Hypothesized Multitrait-multimethod Model for the MSPAP-CTBS
Correlation Data








Equate MSPAP 1995 and 1996 Scale Scores

I. Adjusting the Version Effect of the Three Test Forms
. Using Equivalent Group Design
. Using the Linear Equipercentile Equating Procedure to Obtain the
transformation coefficients for the Version Effect

II. Adjusting Rater Year Effect between Two Years

III. Adjusting Test Version Effect between Two Years

Figure 1: The Equating Design Used for Equating MSPAP 1995 and 1996 Scale Scores



Figure 2: A SEM Model for Computing the Intercorrelations among the Twelve True
Scores of MSPAP
[Path diagram not reproduced; legible labels: E1, RD96, TRD96, E2]

Figure 3: A SEM Model: Six Correlated Latent Traits for a MSPAP Longitudinal
Associations Data
[Path diagram not reproduced; legible labels: error terms E1 to E12]

Figure 4: A SEM Model: A Second-order Factor for a MSPAP Longitudinal
Associations Data
[Path diagram not reproduced; legible labels: error terms E1 to E12]

Figure 5: A SEM Model: A Second-order Factor and Method Effects for a MSPAP
Longitudinal Associations Data
[Path diagram not reproduced]

Figure 6: A Hypothesized Multitrait-multimethod Model for the MSPAP-CTBS
Correlation Data
[Path diagram not reproduced]



Table 1. The Intercorrelation Matrix of Longitudinal Data, Grade 3 MSPAP 1994 with Grade 5 MSPAP 1996
[Table body not legible in this copy]




Table 2. Factor Loadings of Grade 5 MSPAP 1996 Data, Coefficient Alpha and the Proportion of
Unique Variance for Each Content Area Test
[Table body not legible in this copy]



Table 3

Hypothesis Tests and Fit Indices for Models from L1 to L4
[Table body not legible in this copy]






Table 4

Variance Components due to Trait, Method and Error for the First-order Factor Analysis and Variance Components due to the
Second-order Trait and Residual for the Second-order Factor Analysis
[Table body not legible in this copy]



Table 5. Multitrait-multimethod Correlation Matrix among MSPAP, CTBS and OLSAT
[Table body not legible in this copy]



Table 6

Hypothesis Tests and Fit Indices for Models from M1 to M4
[Table body not legible in this copy]




Table 7

Variance Components due to Trait, Method and Error for Model M1
[Table body not legible in this copy]







The Construct Validity of a Performance-based Assessment Program 



Yuan H. Li, Valeria Ford, Leroy J. Tompkins 



Prince George’s County Public Schools, Maryland 




Paper presented at the annual meeting of the American Educational 
Research Association, April, 19-23, Montreal, Canada 





Name: Yuan H. LI 

Address: Prince George's County Public Schools 
Room 205 

Upper Marlboro, MD. 20772 
Tel: 301-952-6764 
Fax: 301-952-6228 

Email: jeffli@pgcps.org










